30 research outputs found

    Joshua 3.0: Syntax-based Machine Translation with the Thrax Grammar Extractor

    We present progress on Joshua, an open-source decoder for hierarchical and syntax-based machine translation. The main focus is describing Thrax, a flexible, open source synchronous context-free grammar extractor. Thrax extracts both hierarchical (Chiang, 2007) and syntax-augmented machine translation (Zollmann and Venugopal, 2006) grammars. It is built on Apache Hadoop for efficient distributed performance, and can easily be extended with support for new grammars, feature functions, and output formats.
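
    Thrax itself runs as distributed Hadoop jobs, but the core hierarchical extraction step it performs (following Chiang, 2007) can be pictured on a single aligned sentence pair. The sketch below is a simplified single-machine illustration, not Thrax code; the function names, the phrase-consistency check, and the one-gap rule construction are our own simplifications.

    # Toy illustration of Chiang-style hierarchical rule extraction from one
    # aligned sentence pair. Simplified sketch, not Thrax's Hadoop implementation.

    def consistent_phrases(src, tgt, align, max_len=5):
        """Enumerate phrase pairs consistent with the word alignment."""
        pairs = []
        for i1 in range(len(src)):
            for i2 in range(i1, min(len(src), i1 + max_len)):
                # target positions linked to the source span [i1, i2]
                tgt_pos = [j for (i, j) in align if i1 <= i <= i2]
                if not tgt_pos:
                    continue
                j1, j2 = min(tgt_pos), max(tgt_pos)
                # consistency: nothing in [j1, j2] aligns outside [i1, i2]
                if all(i1 <= i <= i2 for (i, j) in align if j1 <= j <= j2):
                    pairs.append(((i1, i2), (j1, j2)))
        return pairs

    def hierarchical_rules(src, tgt, pairs):
        """Form rules with one gap by subtracting an inner phrase pair."""
        rules = set()
        for (i1, i2), (j1, j2) in pairs:
            rules.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
            for (k1, k2), (l1, l2) in pairs:
                if i1 <= k1 and k2 <= i2 and (k1, k2) != (i1, i2) \
                   and j1 <= l1 and l2 <= j2:
                    s = src[i1:k1] + ["[X,1]"] + src[k2 + 1:i2 + 1]
                    t = tgt[j1:l1] + ["[X,1]"] + tgt[l2 + 1:j2 + 1]
                    rules.add((" ".join(s), " ".join(t)))
        return rules

    src = "ne mange pas".split()
    tgt = "does not eat".split()
    align = [(0, 1), (1, 2), (2, 1)]   # ne-not, mange-eat, pas-not
    for lhs, rhs in sorted(hierarchical_rules(src, tgt, consistent_phrases(src, tgt, align))):
        print(f"[X] -> {lhs} ||| {rhs}")

    On this classic example the output includes the gapped rule "[X] -> ne [X,1] pas ||| not [X,1]" alongside the plain phrase pairs, which is the kind of rule a hierarchical grammar extractor produces.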

    Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands

    To understand diverse natural language commands, virtual assistants today are trained with numerous labor-intensive, manually annotated sentences. This paper presents a methodology and the Genie toolkit that can handle new compound commands with significantly less manual effort. We advocate formalizing the capability of virtual assistants with a Virtual Assistant Programming Language (VAPL) and using a neural semantic parser to translate natural language into VAPL code. Genie needs only a small realistic set of input sentences for validating the neural model. Developers write templates to synthesize data; Genie uses crowdsourced paraphrases and data augmentation, along with the synthesized data, to train a semantic parser. We also propose design principles that make VAPL languages amenable to natural language translation. We apply these principles to revise ThingTalk, the language used by the Almond virtual assistant. We use Genie to build the first semantic parser that can support compound virtual assistant commands with unquoted free-form parameters. Genie achieves 62% accuracy on realistic user inputs. We demonstrate Genie's generality by showing a 19% and 31% improvement over the previous state of the art on a music skill, aggregate functions, and access control. (To appear in PLDI 2019.)
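
    The synthesis step Genie relies on can be pictured as expanding developer-written templates that pair a natural-language pattern with a program pattern, then filling the slots with parameter values. The sketch below is only a toy illustration of that idea; the template syntax, the slot values, and the ThingTalk-like program strings are invented for the example and are not Genie's actual DSL.

    # Toy sketch of template-based data synthesis in the spirit of Genie:
    # each template pairs an utterance pattern with a program pattern, and
    # slot values are expanded to produce (sentence, program) training pairs.
    # Template format and programs below are invented for illustration only.

    from itertools import product

    templates = [
        ("play {song} by {artist}",
         "@music.play(song={song!r}, artist={artist!r})"),
        ("when it rains, play {song}",
         "monitor @weather.rain() => @music.play(song={song!r})"),
    ]

    slot_values = {
        "song": ["Hey Jude", "Bohemian Rhapsody"],
        "artist": ["The Beatles", "Queen"],
    }

    def synthesize(templates, slot_values):
        """Expand every template over all combinations of its slot values."""
        examples = []
        for utt, prog in templates:
            slots = [s for s in slot_values if "{" + s + "}" in utt]
            for combo in product(*(slot_values[s] for s in slots)):
                binding = dict(zip(slots, combo))
                examples.append((utt.format(**binding), prog.format(**binding)))
        return examples

    for sentence, program in synthesize(templates, slot_values):
        print(sentence, "->", program)

    In the real toolkit the synthesized pairs are combined with crowdsourced paraphrases before training the neural parser; the expansion step above only shows where the bulk of the cheap training data comes from.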

    cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models

    We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.
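
    The unified internal representation the abstract describes is a translation forest, i.e. a hypergraph whose nodes are derivation states and whose hyperedges carry rule applications and weights; 1-best (Viterbi) and k-best extraction are then generic operations over that structure. Below is a minimal sketch of such a hypergraph with Viterbi extraction; the class and field names are ours and this mirrors the idea only, not cdec's C++ API.

    # Minimal sketch of a translation hypergraph ("forest") with Viterbi
    # (1-best) extraction. Names are ours, not cdec's.

    from dataclasses import dataclass, field

    @dataclass
    class Edge:
        head: int       # node this edge derives
        tails: list     # antecedent node ids (empty for leaves)
        weight: float   # log-probability of the rule application
        label: str      # target-side string produced by the rule

    @dataclass
    class Hypergraph:
        num_nodes: int
        edges: list = field(default_factory=list)

        def viterbi(self):
            """Best derivation score and string per node; edges are bottom-up."""
            best = [(float("-inf"), "")] * self.num_nodes
            for e in self.edges:
                score = e.weight + sum(best[t][0] for t in e.tails)
                text = " ".join([best[t][1] for t in e.tails] + [e.label]).strip()
                if score > best[e.head][0]:
                    best[e.head] = (score, text)
            return best[-1]     # root is assumed to be the last node

    # Tiny forest: two leaf translations compete, the root combines them.
    hg = Hypergraph(num_nodes=3, edges=[
        Edge(head=0, tails=[], weight=-0.2, label="the house"),
        Edge(head=0, tails=[], weight=-0.9, label="the home"),
        Edge(head=1, tails=[], weight=-0.1, label="is small"),
        Edge(head=2, tails=[0, 1], weight=-0.05, label=""),
    ])
    print(hg.viterbi())   # (-0.35, 'the house is small')

    Because model-specific logic only decides which edges go into the forest, the same traversal can be reused for k-best lists, alignment to a reference, or computing the statistics needed for discriminative training, which is the separation of concerns the abstract emphasizes.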

    Learning to Translate with Products of Novices: A Suite of Open-Ended Challenge Problems for Teaching MT

    Machine translation (MT) draws from several different disciplines, making it a complex subject to teach. There are excellent pedagogical texts, but problems in MT and current algorithms for solving them are best learned by doing. As a centerpiece of our MT course, we devised a series of open-ended challenges for students in which the goal was to improve performance on carefully constrained instances of four key MT tasks: alignment, decoding, evaluation, and reranking. Students brought a diverse set of techniques to the problems, including some novel solutions which performed remarkably well. A surprising and exciting outcome was that student solutions or their combinations fared competitively on some tasks, demonstrating that even newcomers to the field can help improve the state of the art on hard NLP problems while simultaneously learning a great deal. The problems, baseline code, and results are freely available.
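
    Challenges of this kind usually hand students a deliberately weak baseline to beat. Purely as an illustration of what such a constrained starting point might look like for the alignment task (the course's actual baseline code is not reproduced here), the sketch below links each source word to the target word with the highest Dice coefficient over sentence-level co-occurrence counts.

    # Hypothetical weak word-alignment baseline of the kind students might be
    # asked to improve on: align each source word to the target word with the
    # highest Dice coefficient. Not the course's actual baseline code.

    from collections import Counter

    def dice_align(bitext):
        """bitext: list of (source_tokens, target_tokens) sentence pairs."""
        s_count, t_count, st_count = Counter(), Counter(), Counter()
        for src, tgt in bitext:
            for s in set(src):
                s_count[s] += 1
            for t in set(tgt):
                t_count[t] += 1
            for s in set(src):
                for t in set(tgt):
                    st_count[(s, t)] += 1

        alignments = []
        for src, tgt in bitext:
            links = []
            for i, s in enumerate(src):
                scores = [(2 * st_count[(s, t)] / (s_count[s] + t_count[t]), j)
                          for j, t in enumerate(tgt)]
                best_score, j = max(scores)
                if best_score > 0:
                    links.append((i, j))
            alignments.append(links)
        return alignments

    bitext = [("das haus".split(), "the house".split()),
              ("das buch".split(), "the book".split()),
              ("ein buch".split(), "a book".split())]
    for links in dice_align(bitext):
        print(links)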

    Data-driven sentence simplification: Survey and benchmark

    Sentence Simplification (SS) aims to modify a sentence in order to make it easier to read and understand. In order to do so, several rewriting transformations can be performed, such as replacement, reordering, and splitting. Executing these transformations while keeping sentences grammatical, preserving their main idea, and generating simpler output is a challenging and still far from solved problem. In this article, we survey research on SS, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays. We also include a benchmark of different approaches on common datasets so as to compare them and highlight their strengths and limitations. We expect that this survey will serve as a starting point for researchers interested in the task and help spark new ideas for future developments.
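
    Two of the transformations the survey groups approaches by, lexical replacement and sentence splitting, are easy to picture with a toy rule-based pass. The sketch below is an invented illustration, not any system surveyed in the article; the mini-lexicon and the "which"-clause heuristic are assumptions made for the example.

    # Toy illustration of two sentence-simplification transformations:
    # lexical replacement and sentence splitting. Rules and lexicon are
    # invented for the example, not taken from any surveyed system.

    import re

    SIMPLER = {
        "purchase": "buy",
        "approximately": "about",
        "utilize": "use",
    }

    def replace_lexical(sentence):
        """Swap complex words for simpler synonyms."""
        return " ".join(SIMPLER.get(t.lower(), t) for t in sentence.split())

    def split_on_which(sentence):
        """Split a non-restrictive 'which' clause into its own sentence."""
        m = re.match(r"(.+?), which (.+)\.$", sentence)
        if not m:
            return [sentence]
        main, clause = m.groups()
        # crude heuristic: reuse the last two words of the main clause as subject
        subject = " ".join(main.split()[-2:])
        return [main + ".", subject.capitalize() + " " + clause + "."]

    s = "The firm decided to purchase the building, which costs approximately two million dollars."
    for out in split_on_which(replace_lexical(s)):
        print(out)

    The data-driven systems the survey covers learn such rewrites from aligned original-simplified sentence pairs instead of hand-written rules, but the output they aim for has the same shape: simpler vocabulary and shorter, self-contained sentences.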

    PARADIGM: Paraphrase Diagnostics through Grammar Matching

    Paraphrase evaluation is typically done either manually or through indirect, task-based evaluation. We introduce an intrinsic evaluation, PARADIGM, which measures the goodness of paraphrase collections that are represented using synchronous grammars. We formulate two measures that evaluate these paraphrase grammars using gold standard sentential paraphrases drawn from a monolingual parallel corpus. The first measure calculates how often a paraphrase grammar is able to synchronously parse the sentence pairs in the corpus. The second measure enumerates paraphrase rules from the monolingual parallel corpus and calculates the overlap between this reference paraphrase collection and the paraphrase resource being evaluated. We demonstrate the use of these evaluation metrics on paraphrase collections derived from three different data types: multiple translations of classic French novels, comparable sentence pairs drawn from different newspapers, and bilingual parallel corpora. We show that PARADIGM correlates with human judgments more strongly than BLEU on a task-based evaluation of paraphrase quality.
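
    The second measure is, at heart, set overlap between a reference collection of paraphrase rules (extracted from the monolingual parallel corpus) and the rules in the collection under evaluation. The sketch below shows how such an overlap score might be computed; the flat "lhs ||| rhs" rule representation and the precision/recall/F1 formulation are our assumptions for illustration, not the paper's exact definition.

    # Hedged sketch of an overlap-style measure between a reference paraphrase
    # rule set and an evaluated collection, in the spirit of PARADIGM's second
    # measure. Rule format and P/R/F1 formulation are assumptions.

    def rule_overlap(reference_rules, candidate_rules):
        """Precision, recall, and F1 of candidate rules against the reference."""
        ref, cand = set(reference_rules), set(candidate_rules)
        hit = len(ref & cand)
        precision = hit / len(cand) if cand else 0.0
        recall = hit / len(ref) if ref else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    reference = {"the man ||| the gentleman",
                 "bought ||| purchased",
                 "a car ||| an automobile"}
    candidate = {"bought ||| purchased",
                 "a car ||| an automobile",
                 "quickly ||| rapidly"}
    print(rule_overlap(reference, candidate))   # (0.666..., 0.666..., 0.666...)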